reference letter
This Candidate is [MASK]. Letters of Reference and Job Market Outcomes using LLMs
I implement a prompt-based learning strategy to extract measures of sentiment and other features from confidential reference letters. I show that the contents of reference letters is clearly reflected in the performance of job market candidates in the Economics academic job market. In contrast, applying traditional ``bag-of-words'' approaches produces measures of sentiment that, while positively correlated to my LLM-based measure, are not predictive of job market outcomes. Using a random forest, I show that both letter quality and length are predictive of success in the job market. Letters authored by advisers appear to be as important as those written by other referees.
White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs
Language agency is an important aspect of evaluating social biases in texts. While several studies approached agency-related bias in human-written language, very limited research has investigated such biases in Large Language Model (LLM)-generated content. In addition, previous research often relies on string-matching techniques to identify agentic and communal words within texts, which fall short of accurately classifying language agency. We introduce the novel Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing agency levels attributed to different demographic groups in model generations. LABE leverages 5,400 template-based prompts, an accurate agency classifier, and corresponding bias metrics to test for gender, racial, and intersectional language agency biases in LLMs on 3 text generation tasks: biographies, professor reviews, and reference letters. To build better and more accurate automated agency classifiers, we also contribute and release the Language Agency Classification (LAC) dataset, consisting of 3,724 agentic and communal sentences. Using LABE, we unveil previously under-explored language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) For the same text category, LLM generations demonstrate higher levels of gender bias than human-written texts; (2) On most generation tasks, models show remarkably higher levels of intersectional bias than the other bias aspects. Those who are at the intersection of gender and racial minority groups -- such as Black females -- are consistently described by texts with lower levels of agency; (3) Among the 3 LLMs investigated, Llama3 demonstrates greatest overall bias in language agency; (4) Not only does prompt-based mitigation fail to resolve language agency bias in LLMs, but it frequently leads to the exacerbation of biases in generated texts.
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters
Wan, Yixin, Pu, George, Sun, Jiao, Garimella, Aparna, Chang, Kai-Wei, Peng, Nanyun
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content, including professional documents such as recommendation letters. Though bringing convenience, this application also introduces unprecedented fairness concerns. Model-generated reference letters might be directly used by users in professional scenarios. If underlying biases exist in these model-constructed letters, using them without scrutinization could lead to direct societal harms, such as sabotaging application success rates for female applicants. In light of this pressing issue, it is imminent and necessary to comprehensively study fairness issues and associated harms in this real-world use case. In this paper, we critically examine gender biases in LLM-generated reference letters. Drawing inspiration from social science findings, we design evaluation methods to manifest biases through 2 dimensions: (1) biases in language style and (2) biases in lexical content. We further investigate the extent of bias propagation by analyzing the hallucination bias of models, a term that we define to be bias exacerbation in model-hallucinated contents. Through benchmarking evaluation on 2 popular LLMs- ChatGPT and Alpaca, we reveal significant gender biases in LLM-generated recommendation letters. Our findings not only warn against using LLMs for this application without scrutinization, but also illuminate the importance of thoroughly studying hidden biases and harms in LLM-generated professional documents.
The End of Recommendation Letters
I was lunching with a group of fellow professors, and, as happens these days when we assemble, generative artificial intelligence was discussed. Are your students using it? What are you doing to prevent cheating? Heads were shaken in chagrin as iced teas were sipped for comfort. But then, one of my colleagues wondered: Could he use AI to generate a reference letter for a student?
Tenure Track Position in Artificial Intelligence - Competition No. A105033663
The Department of Computing Science at the University of Alberta invites applications for tenure-track or tenured faculty positions at all levels. Candidates with a strong research record in the area of Artificial Intelligence (AI), in particular (but not limited to) Machine Learning, Natural Language Processing, Computer Games, Visualization, Security, Planning/Heuristic Search, and Algorithmic Game Theory, will be considered for this position. According to csrankings.org the department is ranked #1 in Canada and averaged #3 in the world in terms of number of publications at top AI venues in the last 10 years, and it is also home to Amii (www.amii.ca), the Alberta Machine Intelligence Institute, formerly known as AICML. It is noteworthy that the 2017 Government of Canada Budget included an investment of $125 million into a Pan-Canadian Artificial Intelligence Strategy which features a major investment in research at the University of Alberta. According to the most recent Times Higher Education World University ranking, the department is ranked 3rd in Canada and 67th in the world.